Scientific Python antipatterns advent calendar day ten
For today, a kind of follow-up to yesterday’s post. As a reminder, I’ll post one tiny example per day with the intention that they should only take a couple of minutes to read.
If you want to read them all but can’t be bothered checking this website each day, sign up for the mailing list and I’ll send a single email at the end with links to them all.
Using print and return for error handling
In yesterday’s post we looked at why we generally try to avoid returning a special value from a function if we don’t have to. But what if our function runs into a situation that it can’t handle? Imagine we have a tiny function that takes an email address, splits it into two parts - the username and the domain name - and returns just the domain name:
def get_domain(email_address):

    # split and take the last element of the split
    domain = email_address.split('@')[-1]
    return domain

get_domain('martin@pythonforbiologists.com'), get_domain('billgates@microsoft.com')
('pythonforbiologists.com', 'microsoft.com')
How will this function handle incorrect inputs? Some will generate an error automatically:
get_domain(42)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[10], line 1
----> 1 get_domain(42)

Cell In[9], line 4, in get_domain(email_address)
      1 def get_domain(email_address):
      2 
      3     # split and take the last element of the split
----> 4     domain = email_address.split('@')[-1]
      5     return domain

AttributeError: 'int' object has no attribute 'split'
but some will give incorrect output:
get_domain('martin_at_pythonforbiologists.com')
'martin_at_pythonforbiologists.com'
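The reason is worth spelling out: when the separator is missing, split gives back a one-element list holding the whole string, so taking the last element just hands us our input again. A quick check, using only the two addresses from above:

```python
# str.split returns a one-element list when the separator is absent,
# so [-1] picks out the whole input string
parts_valid = 'martin@pythonforbiologists.com'.split('@')
parts_invalid = 'martin_at_pythonforbiologists.com'.split('@')

print(parts_valid)    # ['martin', 'pythonforbiologists.com']
print(parts_invalid)  # ['martin_at_pythonforbiologists.com']
```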
We would like to avoid this, as it will cause problems later on. We can easily add some error checking:
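def get_domain(email_address):
    if '@' in email_address:
        domain = email_address.split('@')[-1]
        return domain

result = get_domain('martin_at_pythonforbiologists.com')
print(result)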
None
Our function never hits a return, so we get the special value None. We will only notice this if we print the result; otherwise we might end up using it:
result = get_domain('martin_at_pythonforbiologists.com')
result == 'pythonforbiologists.com'
False
and never noticing that the function did not do anything.
Beginners often try to fix this by adding some debugging messages with print:
def get_domain(email_address):
    if '@' in email_address:
        domain = email_address.split('@')[-1]
        return domain
    else:
        print('ERROR: email address must contain an @ character!')

get_domain('martin_at_pythonforbiologists.com')
ERROR: email address must contain an @ character!
but this doesn’t stop us from accidentally storing the result and using it just like before:
result = get_domain('martin_at_pythonforbiologists.com')
result == 'pythonforbiologists.com'
ERROR: email address must contain an @ character!
False
We have to hope that we notice the printed error message. Even worse is to return the error message:
def get_domain(email_address):
    if '@' in email_address:
        domain = email_address.split('@')[-1]
        return domain
    else:
        return 'ERROR: email address must contain an @ character!'
Now if we use our function to process a batch of email addresses:
email_addresses = [
    'martin@pythonforbiologists.com',
    'alice',
    'bob@gmail.com',
    'billg_at_microsoft.com'
]

domains = []
for email_address in email_addresses:
    domains.append(get_domain(email_address))
we are left with a list of strings that contains a mixture of valid domains and error messages:
domains
['pythonforbiologists.com',
'ERROR: email address must contain an @ character!',
'gmail.com',
'ERROR: email address must contain an @ character!']
which will be very hard to deal with.
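To see why, here is a rough sketch of the clean-up that any code using this list would need. It assumes the error-returning version of get_domain from above (repeated so the snippet runs on its own), and checking for the 'ERROR' prefix is just one fragile way of telling the two kinds of string apart:

```python
def get_domain(email_address):
    # the error-returning version from above
    if '@' in email_address:
        return email_address.split('@')[-1]
    else:
        return 'ERROR: email address must contain an @ character!'

email_addresses = ['martin@pythonforbiologists.com', 'alice',
                   'bob@gmail.com', 'billg_at_microsoft.com']
domains = [get_domain(address) for address in email_addresses]

# every caller has to remember to do this filtering, and it silently
# stops working if the wording of the error message ever changes
valid_domains = [d for d in domains if not d.startswith('ERROR')]
print(valid_domains)  # ['pythonforbiologists.com', 'gmail.com']
```

Nothing forces us to write that filter, and nothing warns us if we forget it.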
So what’s the right way to deal with this? Use Python’s built-in exception system to signal the error:
def get_domain(email_address):
    if '@' in email_address:
        domain = email_address.split('@')[-1]
        return domain
    else:
        raise ValueError('ERROR: email address must contain an @ character!')
This makes no difference to the behaviour for valid inputs:
get_domain('martin@pythonforbiologists.com')
'pythonforbiologists.com'
but immediately triggers a crash on invalid inputs:
get_domain('martin_at_pythonforbiologists.com')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[25], line 1
----> 1 get_domain('martin_at_pythonforbiologists.com')

Cell In[23], line 6, in get_domain(email_address)
      4         return domain
      5     else:
----> 6         raise ValueError('ERROR: email address must contain an @ character!')

ValueError: ERROR: email address must contain an @ character!
Now it is impossible for us to accidentally use the return value, as the function never returns on an invalid input (it crashes instead). And it’s impossible for us to ignore the error! We now have a function that can only ever do one of two things:
return the correct output for a valid input
crash on an invalid input
and using the rest of Python’s exception system (which is too long an explanation for an advent calendar post!) we will easily be able to decide how to handle errors.
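As a small taste, here is a minimal sketch of how the batch example from earlier might look with try and except. The function is the exception-raising version from above (repeated so the snippet runs on its own); the failed list and the decision to skip bad addresses are assumptions made for illustration, not the only sensible policy:

```python
def get_domain(email_address):
    # the exception-raising version from above
    if '@' in email_address:
        return email_address.split('@')[-1]
    else:
        raise ValueError('ERROR: email address must contain an @ character!')

email_addresses = ['martin@pythonforbiologists.com', 'alice',
                   'bob@gmail.com', 'billg_at_microsoft.com']

domains = []
failed = []  # hypothetical: remember the inputs we could not process
for email_address in email_addresses:
    try:
        domains.append(get_domain(email_address))
    except ValueError:
        # the error reaches us here, so we choose what happens next
        failed.append(email_address)

print(domains)  # ['pythonforbiologists.com', 'gmail.com']
print(failed)   # ['alice', 'billg_at_microsoft.com']
```

The key difference from returning an error message is that the valid results and the failures never end up mixed together in the same list.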
Bonus: by convention in Python - and because it’s generally clearer - we try to put error-checking code at the start of the function so that we can easily skip it when reading the code and trying to understand the expected behaviour:
def get_domain(email_address):
    if '@' not in email_address:
        raise ValueError('ERROR: email address must contain an @ character!')

    domain = email_address.split('@')[-1]
    return domain
This also avoids the need for a separate else branch, leaving the code even cleaner.